Junghoon Chae, Purdue University, jchae@purdue.edu PRIMARY
Guizhen Wang, Purdue University, wang1908@purdue.edu
Benjamin Ahlbrand, Purdue University, bahlbran@purdue.edu
Mahesh Babu Gorantla, Purdue University, mgorantl@purdue.edu
Jiawei Zhang, Purdue University, zhan1486@purdue.edu
Siqiao Chen, Purdue University, chen1722@purdue.edu
Hanye Xu, Purdue University, xu193@purdue.edu
Jieqiong Zhao, Purdue University, zhao413@purdue.edu
William Hatton, United States Air Force Academy, C16william.hatton@usafa.edu
Abish Malik, Purdue University, amalik@purdue.edu
Sungahn Ko, Purdue University, ko@purdue.edu
David S. Ebert, Purdue University, ebertd@purdue.edu
Student Team: NO
Our customized visual analytics tool DinofunVis
Tableau
MS Excel
R
Gephi
We also applied our own clustering algorithms; please refer to the Appendix at the end of this document.
Approximately how many hours were spent working on this submission in total?
100 hours.
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES
Video Download
Video:
http://pixel.ecn.purdue.edu:8080/~zhan1486/VASTCHALLENGE15/GC.wmv
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
For each of the following questions, consider both the movement and communications data.
GC.1 – Scott is not a paying customer and does not have an ID. Describe Scott Jones’ activities in the park during the three-day weekend. Who does he spend most of his time with? When does he arrive? When does he leave? What route does he follow?
Limit your response to no more than 10 images and 1000 words.
GC.2 – Identify up to 8 issues with park operations during the three-day weekend.
Provide a rationale for your answers.
1. The app the park provides is useful for both visitors and park administrators. One issue with the app’s movement monitoring, from the park administrators’ perspective, is the distinction between check-ins and movements. The check-in feature currently applies only to attractions and rides, not to food, shopping, restroom, or miscellaneous locations. Thus, the movement classification is not always accurate: time spent stopped at these places is recorded as part of movement, which makes movements appear longer than they actually are. To give park analysts clearer data, the app should record the time a person spends at one of these lesser features as a check-in, so that every stop is labeled as such.
2. The park employees responsible for the pavilion made a couple of mistakes on Sunday. The pavilion was locked before Scott Jones’ performances, so it was closed each morning from 10:00 AM to 11:00 AM (as can be seen in Figure 2-1).
On Sunday, however, the employee responsible for locking and unlocking the pavilion failed to unlock it until about 11:30 AM, giving the criminals another half hour to vandalize the area. The employee also locked the pavilion without checking whether anyone remained inside: the three suspects (IDs 416790, 461004, and 1502920) were left alone in the pavilion the entire time it was locked.
Either the employee was involved in the crime or failed to do the job properly, depriving visitors of their time at the pavilion and giving the suspects more time to commit the crime.
3. Several attractions had much longer check-in durations, likely because their popularity created long lines that forced people to wait for extended periods. Our mapping tool shows that several people spend a large share of their time at certain rides, as can be observed in Figure 2-2 (Top; x-axis: attraction, y-axis: number of check-ins). The attractions highlighted in this figure are thrill rides. Further, Figure 2-2 (Bottom) shows the average wait time for the thrill rides, where the wait times for Attractions 4, 5, and 7 are exceedingly large.
Many people who visit a popular attraction can experience only that one ride in a whole hour. Therefore, people trying to fit the most popular attractions into their day will struggle to find time to see and experience the rest of the park. As a result, the less popular areas of the park are underused and contribute less to park earnings than they would if wait times at the larger attractions were shorter. One way to mitigate this issue is to prevent a person from checking in again within a given time window. This policy could help balance the number of visitors among attractions.
4. The pavilion and the performance stage for Scott’s shows each had only one entrance. Unlike the pavilion, which houses valuable memorabilia, the stage area should have multiple entrances/exits connected to the paths to relieve the rush of people caused by a feature show such as Scott’s. Both areas would also be more accessible if they had been built at the center of the park, so that people would not create a massive flow toward these regions when they enter the park or when the show starts (shown with thick arrows in Figure 1-2).
5. One of the largest departures from the norm is the group of people who check in only at the park entrance. Between 20 and 30 people over the weekend simply check in at the park entrance and then never check in at any ride or attraction, as shown in the image below. For example, the IDs listed below are the customers who checked in to the park on Friday but did not check in anywhere else. We recommend that park management detect such anomalous behavior so that staff are alerted to potentially suspicious behavior, or simply to people in need of assistance.
6. With our sequence-based clustering technique (see Appendix for details), we identify groups of visitors with different patterns of attraction use on different days, as shown in Figure 2-3. For instance, the visitors in the largest cluster on Sunday (6839 visitors) prefer thrill rides (green) and kiddie rides (purple). However, the thrill rides are spread across the park, which is inconvenient for visitors. We recommend placing the thrill rides closer to each other, unless they are deliberately spread out to reduce crowding and increase revenue from nearby refreshment stands and souvenir shops.
7. The park administrators also need to be aware of malicious people trying to disable their tracking devices. We identified one person, ID 392618, who may have tampered with their device: this person’s recorded position suddenly jumped to building 37 at around 8:43 PM, as shown in Figure 2-4.
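Items 1 and 3 above rely on how long visitors stay after a check-in. The following minimal Python sketch shows one way to estimate this. It assumes each log record is a hypothetical `(timestamp, visitor_id, record_type, x, y)` tuple and that a stay lasts from a check-in until the same visitor’s next record; the record layout and the function name `mean_checkin_duration` are illustrative assumptions, not DinofunVis internals. Because the result mixes queueing and ride time, it is only a rough proxy for wait time.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical record layout: (timestamp, visitor_id, record_type, x, y),
# where record_type is "movement" or "check-in".
Record = tuple[datetime, int, str, int, int]


def mean_checkin_duration(records: list[Record]) -> dict[tuple[int, int], float]:
    """Mean minutes from a check-in to the same visitor's next record,
    grouped by the (x, y) location of the check-in."""
    by_visitor: dict[int, list[Record]] = defaultdict(list)
    for rec in records:
        by_visitor[rec[1]].append(rec)

    durations: dict[tuple[int, int], list[float]] = defaultdict(list)
    for recs in by_visitor.values():
        recs.sort(key=lambda r: r[0])  # chronological order per visitor
        for cur, nxt in zip(recs, recs[1:]):
            if cur[2] == "check-in":
                minutes = (nxt[0] - cur[0]).total_seconds() / 60
                durations[(cur[3], cur[4])].append(minutes)
    return {loc: mean(vals) for loc, vals in durations.items()}


# Made-up records: two visitors check in at the same ride.
t = datetime(2014, 6, 8, 9, 0)
demo = [
    (t, 1, "check-in", 10, 20),
    (t + timedelta(minutes=45), 1, "movement", 11, 20),
    (t, 2, "check-in", 10, 20),
    (t + timedelta(minutes=15), 2, "movement", 11, 21),
]
print(mean_checkin_duration(demo))  # {(10, 20): 30.0}
```

A visitor whose last record is a check-in contributes no duration, since that stay has no observed end.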
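The entrance-only visitors described in item 5 can be found with a simple set comparison. This sketch assumes one day of check-ins given as hypothetical `(visitor_id, x, y)` tuples and a caller-supplied set of entrance coordinates; the coordinates in the usage lines are placeholders, not the park’s real entrance locations, and the function name is our own.

```python
from collections import defaultdict


def entrance_only_visitors(checkins, entrance_locations):
    """Return sorted IDs of visitors whose every check-in is at an entrance.

    checkins: iterable of (visitor_id, x, y) check-in records for one day.
    entrance_locations: set of (x, y) entrance coordinates.
    """
    locations = defaultdict(set)
    for visitor_id, x, y in checkins:
        locations[visitor_id].add((x, y))
    # Subset test: true only if the visitor never checked in anywhere else.
    return sorted(vid for vid, locs in locations.items() if locs <= entrance_locations)


# Placeholder entrance coordinates and made-up check-ins.
ENTRANCES = {(0, 50), (99, 50)}
day = [(1, 0, 50), (2, 0, 50), (2, 30, 40), (3, 99, 50)]
print(entrance_only_visitors(day, ENTRANCES))  # [1, 3]
```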
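For item 7, a tampered device shows up as a position jump that no walking visitor could make. The sketch below assumes, as a working hypothesis, that consecutive movement records of one visitor normally differ by at most one grid cell (diagonals included), and flags any larger step. The threshold `max_step=1.5` and the function name `find_jumps` are our own illustrative choices.

```python
import math
from datetime import datetime, timedelta


def find_jumps(track, max_step=1.5):
    """Flag consecutive positions farther apart than max_step grid cells.

    track: one visitor's movement records as (timestamp, x, y), sorted by time.
    Returns (timestamp, from_xy, to_xy, distance) for each suspicious step.
    max_step=1.5 admits a one-cell diagonal step (about 1.41 cells).
    """
    jumps = []
    for (_, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        if dist > max_step:
            jumps.append((t1, (x0, y0), (x1, y1), dist))
    return jumps


# Made-up track: one normal diagonal step, then an implausible jump.
t = datetime(2014, 6, 8, 20, 40)
track = [(t, 10, 10), (t + timedelta(seconds=1), 11, 11),
         (t + timedelta(seconds=2), 40, 60)]
for when, src, dst, dist in find_jumps(track):
    print(when.time(), src, "->", dst, round(dist, 1))
```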
GC.3 – For the crime, describe the following, and provide your rationale:
a. When did the crime occur?
b. Where did the crime take place?
c. Who are the most likely suspects in the crime?
Limit your response to no more than 5 images and 500 words.
a. Several factors indicate that the crime occurred between 10:00 AM and 11:30 AM on Sunday. The communication time-series graph in Figure 3-1 shows an unusual spike in messages sent from the Wet Land area between 11:30 AM and 12:00 PM, likely because the people checking in to the pavilion after Scott’s performance discovered the vandalism and saw that the medal was stolen. Further, check-ins to the pavilion stop at 10:00 AM, when the park locks it for the morning show, and resume at 11:30 AM. However, the last check-in occurs just before noon; after 12:00 PM, no one is allowed to check in to the pavilion for the rest of the day. This reinforces the idea that the crime was discovered at about 11:30 AM. Thus, we conclude that the crime occurred within this time frame (10:00 AM to 11:30 AM).
b. From the news report, we learn that the crime occurred at the pavilion (building #32). This is confirmed by the spike in communications from the pavilion on Sunday and by the fact that no one is allowed to check in there for the entire afternoon and evening (Figure 3-2).
c. Our goal in searching for suspects was to highlight suspicious behavior by individuals on Sunday. Having narrowed down the time window and the place of the crime, we found that three IDs (416790, 461004, and 1502920) spent 2.5 hours in the pavilion and were alone there from 10:00 AM to 11:30 AM. This time frame matches our hypothesis of when the crime occurred. Also, they were the only people in the pavilion between 10:00 AM and 11:00 AM on any day of the weekend. Furthermore, Figure 3-3 shows that these three suspects frequently checked in at areas around Attraction 32, which indicates that they deliberately stayed around this area.
Figure 3-3: The trajectory and check-in hotspots of three IDs, 416790, 461004, and 1502920, on Sunday from 8 AM to 11 PM.
Figure 3-4: Linkages between potential suspects and accomplices as derived from communication data.
Additionally, they communicated with one another during this time period, indicating that they were working together (Figure 3-4). Therefore, we are convinced that these three individuals are the most likely suspects in the vandalism and the theft of the medal.
We also consider the persons with IDs 1123214, 1350546, 1000279, and 1187909 as possible accomplices. These four people communicated heavily with the suspects and even checked in to the same rides throughout the day, with some overlapping times during which the suspects may have handed off the medal to them. Note also that all seven of these individuals talk to people outside the park, which may indicate an external mastermind behind the crime.
Finally, people with IDs 1711922, 430595, and 921888 could be accomplices or witnesses. These people stood outside the pavilion for long periods (10 to 30 minutes) and could have been either guarding the entrance or simply waiting for the pavilion to be unlocked. Note that IDs 1711922 and 430595 communicate heavily with the park security officials, which may indicate that they are part of the security staff (Figure 3-4). Either way, these three people should also be questioned.
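Answers a and c both come down to interval overlap: who was at the pavilion during the 10:00 AM to 11:30 AM window, and for how long. A minimal sketch, assuming stays have already been derived as hypothetical `(visitor_id, location, start, end)` tuples (for example, from a check-in to the visitor’s next record). The function name, the location label, and the IDs in the usage example are made up for illustration.

```python
from datetime import datetime


def visitors_inside(stays, location, window_start, window_end):
    """Minutes each visitor spent at `location` during [window_start, window_end).

    stays: iterable of (visitor_id, location, start, end) with datetime bounds.
    Returns {visitor_id: overlapping minutes}, omitting visitors with no overlap.
    """
    overlap = {}
    for visitor_id, loc, start, end in stays:
        if loc != location:
            continue
        lo, hi = max(start, window_start), min(end, window_end)
        if lo < hi:  # the stay intersects the window
            minutes = (hi - lo).total_seconds() / 60
            overlap[visitor_id] = overlap.get(visitor_id, 0.0) + minutes
    return overlap


def at(hour, minute):
    """Made-up timestamps on a single morning."""
    return datetime(2014, 6, 8, hour, minute)


# Made-up stays; "pavilion" is a placeholder location label.
stays = [
    (101, "pavilion", at(9, 30), at(12, 0)),
    (102, "pavilion", at(8, 0), at(9, 45)),
    (103, "ride", at(10, 0), at(11, 0)),
    (104, "pavilion", at(11, 0), at(11, 15)),
]
print(visitors_inside(stays, "pavilion", at(10, 0), at(11, 30)))
# {101: 90.0, 104: 15.0}
```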
Appendix
Clustering algorithm used for Figure 2-3: We implemented sequence-based clustering to group people based on their check-in sequences, expressed as categories of attractions. In this approach, we first compute the longest common subsequence (LCS) between each pair of customers’ sequences to measure their similarity. Then, we apply a density-based clustering algorithm, DBSCAN, to group the customers.
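The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the exact code behind Figure 2-3: each customer is assumed to be a string of category letters, the LCS length is turned into a distance by normalizing with the longer sequence (our choice; the normalization is not specified above), and `eps` and `min_pts` are hypothetical values. DBSCAN is written out so the block is self-contained; with real data, scikit-learn’s `DBSCAN` with `metric="precomputed"` accepts the same kind of pairwise distance matrix.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, else keep the best of left/up.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_distance(a, b):
    """1 - LCS/max(len): 0.0 for identical sequences, 1.0 if nothing is shared."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def dbscan(items, dist, eps, min_pts):
    """Basic DBSCAN over any distance function; -1 marks noise."""
    n = len(items)
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if d[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [k for k in range(n) if d[j][k] <= eps]
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Made-up sequences: T = thrill, K = kiddie, W = water, S = show;
# "XYZQ" stands for an unrelated visitor.
seqs = ["TTKT", "TTKK", "TKTT", "WWSW", "WWSS", "WSWW", "XYZQ"]
print(dbscan(seqs, lcs_distance, eps=0.5, min_pts=2))
# [0, 0, 0, 1, 1, 1, -1]
```

The pairwise matrix grows quadratically with the number of customers, which is why a library implementation is preferable at park scale.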
Trajectory clustering algorithm (flow visualization with arrows)
We group individual trajectories into classes of similar sub-trajectories using a trajectory clustering model based on the partition-and-group framework. This enables users to discover common sub-patterns rather than only common holistic (whole-trajectory) patterns.
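The partition-and-group framework comes from Lee, Han, and Whang (“Trajectory Clustering: A Partition-and-Group Framework”, SIGMOD 2007), often called TRACLUS. In that framework, trajectories are cut into line segments, and similar segments are grouped by density using a distance that adds perpendicular, parallel, and angle components. The sketch below shows only that segment distance, following our reading of the paper’s definitions. The equal weights and the point-segment fallback are our assumptions; the MDL-based partitioning and the grouping step are omitted, and nothing here is taken from DinofunVis.

```python
import math


def _length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)


def _project(p, s, e):
    """Projection of point p onto the infinite line through s and e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    t = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / (dx * dx + dy * dy)
    return (s[0] + t * dx, s[1] + t * dy)


def segment_distance(seg_a, seg_b, w_perp=1.0, w_par=1.0, w_ang=1.0):
    """Weighted perpendicular + parallel + angle distance between two segments."""
    # Li is the longer segment and Lj the shorter one.
    (si, ei), (sj, ej) = sorted([seg_a, seg_b], key=_length, reverse=True)
    len_i, len_j = _length((si, ei)), _length((sj, ej))
    if len_i == 0:  # both segments are single points (our fallback)
        return math.dist(si, sj)
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular: contraharmonic mean of the offsets of sj and ej from Li.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)

    # Parallel: how close each projection lies to an endpoint of Li.
    d_par = min(min(math.dist(ps, si), math.dist(ps, ei)),
                min(math.dist(pe, si), math.dist(pe, ei)))

    # Angle: ||Lj|| * sin(theta) if theta < 90 degrees, else ||Lj||.
    vi = (ei[0] - si[0], ei[1] - si[1])
    vj = (ej[0] - sj[0], ej[1] - sj[1])
    dot = vi[0] * vj[0] + vi[1] * vj[1]
    cross = abs(vi[0] * vj[1] - vi[1] * vj[0])
    d_ang = cross / len_i if dot > 0 else len_j

    return w_perp * d_perp + w_par * d_par + w_ang * d_ang


a = ((0, 0), (10, 0))
b = ((2, 1), (5, 1))   # parallel, same direction
c = ((5, 1), (2, 1))   # parallel, opposite direction
print(segment_distance(a, b), segment_distance(a, c))  # 3.0 6.0
```

Reversing a segment’s direction raises the angle term to the full segment length, which keeps opposite flows (for example, people walking toward versus away from the stage) in separate groups.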